
Multi-model deploy and test with wrapper #1014

Merged
asm582 merged 3 commits into main from multi-model-indp on Apr 15, 2026


@asm582
Collaborator

@asm582 asm582 commented Apr 15, 2026

This PR introduces a new design for installing the multi-model infrastructure through a wrapper around the current infra install scripts, so the existing e2e infrastructure that runs several other tests does not need to change. It also adds a separate make target, test-multi-model-scaling, that runs multi-model tests kept in sync with the existing single-model benchmark tests. Gemini was used to help with coding.
At this point the code is mostly dormant and not yet connected to CI. The following commands tear down any previous deployment, deploy the multi-model infrastructure, and run the multi-model tests in a single namespace:

make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

@asm582
Collaborator Author

asm582 commented Apr 15, 2026

/ok-to-test

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@asm582 asm582 changed the title from "Multi-model code with wrapper" to "Multi-model deploy and test with wrapper" Apr 15, 2026
@asm582 asm582 requested review from kahilam and lionelvillard April 15, 2026 13:41
@asm582 asm582 closed this Apr 15, 2026
@asm582 asm582 reopened this Apr 15, 2026
@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource   Total   Allocated   Available
GPUs       50      39          11

Cluster          Value
Nodes            16 (7 with GPUs)
Total CPU        993 cores
Total Memory     10383 Gi
GPUs required    4 (min) / 6 (recommended)

@lionelvillard
Collaborator

/ok-to-test

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ❌

Insufficient GPUs to run OpenShift E2E. Re-run with /retest (OpenShift E2E) when GPUs free up.

Resource   Total   Allocated   Available
GPUs       50      49          1

Cluster          Value
Nodes            16 (7 with GPUs)
Total CPU        993 cores
Total Memory     10383 Gi
GPUs required    4 (min) / 6 (recommended)

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource   Total   Allocated   Available
GPUs       50      39          11

Cluster          Value
Nodes            16 (7 with GPUs)
Total CPU        993 cores
Total Memory     10383 Gi
GPUs required    4 (min) / 6 (recommended)

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource   Total   Allocated   Available
GPUs       50      40          10

Cluster          Value
Nodes            16 (7 with GPUs)
Total CPU        993 cores
Total Memory     10383 Gi
GPUs required    4 (min) / 6 (recommended)

@lionelvillard
Collaborator

/ok-to-test

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource   Total   Allocated   Available
GPUs       50      39          11

Cluster          Value
Nodes            16 (7 with GPUs)
Total CPU        993 cores
Total Memory     10383 Gi
GPUs required    4 (min) / 6 (recommended)

Comment thread deploy/install-multi-model.sh
Collaborator

@kahilam kahilam left a comment


Ignore my previous review for now.

@asm582 asm582 merged commit 036dc2a into main Apr 15, 2026
17 checks passed
@asm582 asm582 deleted the multi-model-indp branch April 15, 2026 17:32
kahilam added a commit that referenced this pull request Apr 15, 2026
Addresses review feedback from #1014 to move away from bash deployment
scripts for readability, type safety, and concurrent model deployment.

Key improvements:
- Models 2..N deploy concurrently via goroutines (bash was sequential)
- Connectivity verification uses kubectl port-forward from the Go
  process, eliminating the in-cluster curl Job and its Docker Hub image
  (curlimages/curl:latest)
- Kubernetes resources (Gateway, HTTPRoute) created via dynamic client
  instead of heredoc YAML
- Proper error handling and structured logging

The Go tool is invoked via `go run ./deploy/multimodel` from the same
Makefile targets (deploy-multi-model-infra, undeploy-multi-model-infra).

Made-with: Cursor

3 participants